How to create a fully transparent 
MongoDB database cluster holding 
terabytes of data serving hundreds 
of millions of users simultaneously 
Co-organizer 
Who are we? 
A creative platform where people can create, remix, and share photos and videos 


One of the largest open-source content collections in the world 


The app is available in 30 languages 


World's largest digital creative platform 
and a top 20 most downloaded app 


Picsart milestones (so far) 


2011 year founded 
800* edits every second 
1B* edits per month 


180 countries with active Picsart creators 


150M* monthly active users 


30 supported languages 


Make your milestone 


Long story short 


The reason behind the creation of 
Picsart? 


Did Picsart change, or no? 


What to expect in the future? 


Implications of the changes 


New microservices 


Connections between microservices 


Data size 


New features 


Structural changes 


Flow changes 


Challenges 


Do we have time to analyze our job result? 


Should we still work reactively? Or go 
proactive? 


How can we predict issues? 


How can we prevent issues? 


What will we learn after every incident / issue? 


Data flow 
Google Cloud Platform 
Query lifecycle issues 


[Diagram: Microservice 1, Microservice 2, and Microservice 3 along one query lifecycle, annotated "Should take 1 X ms"] 


What's the core problem? 
Easy way - Blame each other 


Hard way - Work together 


Engineers are engineers 


Plan together 
Work together 
Fix all issues together 
Party together too 


Or be ready for a disaster 


Database queries 


1. Collect all queries from databases 
2. Temporarily store them somewhere 
3. Filter all sensitive and unnecessary data 
4. Move to search and analytics engine 
5. Create dashboards 

[Diagram: sharded cluster (Shard1, Shard2, Shard3, ...) with secondaries] 


1. Collect all queries from databases 


Replication factor 


Primary, read replicas, analytics replicas 


How to keep query consistency? 


4-9 nodes 


1 master, 2-6 read replicas, 1-2 analytics 
replicas 


Collect all queries from all nodes, except analytics 
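
The node-selection rule above can be sketched in a few lines of Python. This is only an illustration: the `Node` type, the role names, and the host names are hypothetical and do not come from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    host: str
    role: str  # assumed labels: "primary", "read", or "analytics"


def nodes_to_collect(nodes):
    """Return every node whose queries are collected (all except analytics)."""
    return [n for n in nodes if n.role != "analytics"]


# Hypothetical smallest shard: 1 primary, 2 read replicas, 1 analytics replica.
shard = [
    Node("shard1-a.example.internal", "primary"),
    Node("shard1-b.example.internal", "read"),
    Node("shard1-c.example.internal", "read"),
    Node("shard1-d.example.internal", "analytics"),
]

targets = nodes_to_collect(shard)
```

With the 4-node layout shown here, three nodes remain as collection targets; a 9-node shard with 2 analytics replicas would leave seven.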

1. Collect all queries from databases 


Cont'd 
How to collect the data the right way? Kafka connector or mongoexport? 
How to read all queries, and where from? mongoexport from all nodes 
Where to transfer them? Kafka OnPrem is the best option 
How to transfer them? Filebeat 
[Diagram: MongoDB primary and secondary nodes, each with mongoexport, feeding a producer and Kafka draft topics] 
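
A minimal sketch of the per-node export step, assuming the queries come from the database profiler's `system.profile` collection (an assumption based on the "Mongod-profiling" label later in the deck). The host, database name, and output path are hypothetical; the function only builds the `mongoexport` argument list and does not run it.

```python
import json
from datetime import datetime, timezone


def mongoexport_command(host, db, since, out_path):
    """Build a mongoexport argv list for profiler entries newer than `since`."""
    # Extended JSON date, so mongoexport compares against the entry timestamp.
    stamp = since.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    query = {"ts": {"$gt": {"$date": stamp}}}
    return [
        "mongoexport",
        "--host", host,
        "--db", db,
        "--collection", "system.profile",  # assumed source of collected queries
        "--query", json.dumps(query),
        "--out", out_path,  # a file that a shipper such as Filebeat could tail
    ]


since = datetime(2024, 1, 1, tzinfo=timezone.utc)
cmd = mongoexport_command(
    "shard1-a.example.internal:27017",       # hypothetical node
    "appdb",                                 # hypothetical database
    since,
    "/var/log/mongo-queries/shard1-a.json",  # hypothetical path
)
```

The list could be handed to `subprocess.run(cmd, check=True)` on a schedule for each selected node, with the caller remembering the last exported timestamp between runs.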


2. Temporarily store them 


3. Filter all sensitive data 


Keep all data in Kafka draft topic 


Prepare a deployment to filter data 


Autoscale deployment 


Filter data and put in final topic 


Analyze data and alert 


[Diagram: Producer, Kafka draft topic, Consumer, Final topic] 
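
The filter and alert steps might look roughly like the sketch below. It assumes each message in the draft topic is one profiler entry already parsed into a dict; the kept-field list, the `"?"` placeholder, and the 100 ms threshold are hypothetical choices, and the Kafka consumer and producer around it are left out.

```python
KEEP_FIELDS = ("op", "ns", "millis", "ts")  # hypothetical list of safe fields


def redact(value):
    """Replace every literal with "?", keeping field names and operators."""
    if isinstance(value, dict):
        return {key: redact(val) for key, val in value.items()}
    if isinstance(value, list):
        return [redact(item) for item in value]
    return "?"


def filter_entry(entry):
    """Return a cleaned copy of one entry, or None if it is unnecessary."""
    if entry.get("ns", "").endswith("system.profile"):
        return None  # drop reads issued by the collector itself
    cleaned = {key: entry[key] for key in KEEP_FIELDS if key in entry}
    cleaned["command"] = redact(entry.get("command", {}))
    return cleaned


def is_slow(entry, threshold_ms=100):
    """Flag entries whose execution time exceeds the alert threshold."""
    return entry.get("millis", 0) > threshold_ms


raw = {
    "op": "query",
    "ns": "appdb.users",
    "millis": 250,
    "user": "svc-account",
    "command": {
        "find": "users",
        "filter": {"email": "a@example.com", "age": {"$gt": 30}},
    },
}
cleaned = filter_entry(raw)
```

Only the query shape reaches the final topic, so dashboards can group by shape without exposing user data, and `is_slow(cleaned)` can drive an alert.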


4. Move to search and analytics engine 
5. Create dashboards 


Transfer all data to ES via Logstash 


Create dashboards to have 


Give access to all relevant 


engineers 


Mongod-profiling 


Diagram of query collection 